* CHANGE DIRECTORY to the directory into which you downloaded the Dataverse files (which is where this script should be), preserving the directory structure of the Dataverse
* User-written package required: geodist
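* A minimal sketch, not part of the original script: install geodist if Stata cannot find it on the ado-path.
* Assumes an internet connection and that the package is installed from SSC.
capture which geodist
if _rc != 0 {
    ssc install geodist
}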

/*** WARNING ***
The data assembly scripts will OVERWRITE .dta files already in the Dataverse: data/EOIR_asylum/asylum_revised.dta, data/weather/NOAA_AQS.dta, and in data/original_article/reconstructed_data: hourlyweather_vargen_GMTadj_nogaps_HS.dta, matched_HS_corrections_stablestations.dta, and matched_HS_improved_corrections.dta.
In particular, a newly created asylum_revised.dta will almost certainly DIFFER slightly from the asylum_revised.dta on Dataverse because of a minor bug described in asylum_data_cleaning_and_checking_AEJ-RR.do: at one point the assembly script drops all but the first of possibly several applications with the same ID, but for 115 applications (out of several million) there is another application with the same ID and the same date, so which of the two counts as the "first" is ambiguous. It would have been better to drop these 230 applications altogether.
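
A possible fix, sketched for illustration only. It is NOT run (it sits inside this comment) and it is not the code used in the cleaning script. The variable names id and app_date are hypothetical stand-ins for the applicant ID and the application date:
    Tag every application that shares both ID and date with another one, then drop all of them:
        duplicates tag id app_date, generate(tie)
        drop if tie > 0
    Only then keep the earliest remaining application per ID:
        bysort id (app_date): keep if _n == 1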
*/
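
* Optional safeguard, a sketch that is not part of the original script: copy each file named in the warning above to a backup before it can be overwritten.
* The "_backup" suffix is an assumption. Files that do not exist are skipped.
* The warning and the comment on the last do line below give different names for the stable-stations file, so both are listed.
local recon data/original_article/reconstructed_data
foreach f in data/EOIR_asylum/asylum_revised data/weather/NOAA_AQS ///
    `recon'/hourlyweather_vargen_GMTadj_nogaps_HS ///
    `recon'/matched_HS_corrections_stablestations ///
    `recon'/matched_HS_improved_corrections_stablestations ///
    `recon'/matched_HS_improved_corrections {
    capture confirm file "`f'.dta"
    if _rc == 0 {
        copy "`f'.dta" "`f'_backup.dta", replace
    }
}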


* my data (from EOIR, NOAA, and AQS)

do data/EOIR_asylum/asylum_data_cleaning_and_checking_AEJ-RR.do // assembles asylum_revised.dta from the raw EOIR csv files. It also performs various sanity checks, including a check for problems with testers etc. (none to speak of) using problematic_judge_codes.dta. Unlike asylum_data_cleaning_and_checking.do, the AEJ-RR version of the script keeps decisions other than grant in the _revised data (see script) and creates asylum_revised.dta rather than asylum.dta
do data/weather/NOAA_AQS_prep.do // assembles NOAA_AQS.dta from the semi-raw files NOAA/NOAA_6a4p.dta and NOAA/AQS_daily_optimized_coverage.dta, which are compilations of NOAA and EPA data created by R scripts that are documented in data/weather/download_scripts_and_helperfiles and are NOT called from this master script. Note that NOAA_AQS_prep.do also creates other variables, such as deviations of temperature from a sine fit, that may be of interest for robustness checks. As a result, this script takes a few hours to run.

* reconstruction of Heyes & Saberian data from their raw data files (you first need to download their data from https://www.aeaweb.org/doi/10.1257/app.20170223.data and unzip it to data/original_article/data)
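
* A minimal sketch, not part of the original script: stop early with a clear message if the Heyes & Saberian data folder is missing.
* Only the folder is checked, because the file names inside their archive are not listed here.
mata: st_numscalar("hs_dir_ok", direxists("data/original_article/data"))
if scalar(hs_dir_ok) == 0 {
    display as error "data/original_article/data not found: download and unzip the Heyes & Saberian data first"
    exit 601
}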

do data/original_article/reconstructed_data/hourlyweather__inc_clean_and_vargen__and_GMTadj_nogaps_HS.do // creates a file of 6am-4pm local time weather averages: hourlyweather_vargen_GMTadj_nogaps_HS.dta
do data/original_article/reconstructed_data/organize_HS_improved_corrections.do // creates the equivalent of Heyes & Saberian's matched.dta, but with corrected court locations and time zone adjustments: matched_HS_improved_corrections.dta
do data/original_article/reconstructed_data/organize_HS_improved_corrections_stablestations.do // same, but instead of the closest measurement station on any given day, uses for each city-year the station (within 20 miles) with the most complete coverage: matched_HS_improved_corrections_stablestations.dta